Greedy expansions in convex optimization 



V.N. Temlyakov 
March 6, 2013 



Abstract 

This paper is a follow-up to the author's previous paper on convex optimization. In that paper we began the process of adjusting greedy-type algorithms from nonlinear approximation for finding sparse solutions of convex optimization problems. There we modified the three greedy algorithms most popular in nonlinear approximation in Banach spaces (the Weak Chebyshev Greedy Algorithm, the Weak Greedy Algorithm with Free Relaxation, and the Weak Relaxed Greedy Algorithm) for solving convex optimization problems. We continue to study sparse approximate solutions to convex optimization problems. It is known that in many engineering applications researchers are interested in an approximate solution of an optimization problem as a linear combination of elements from a given system of elements. There is an increasing interest in building such sparse approximate solutions using different greedy-type algorithms. In this paper we concentrate on greedy algorithms that provide expansions, which means that the approximant at the mth iteration is equal to the sum of the approximant from the previous iteration (the (m-1)th iteration) and one element from the dictionary with an appropriate coefficient. The problem of greedy expansions of elements of a Banach space is well studied in nonlinear approximation theory. At first glance the setting of the problem of expansion of a given element and the setting of the problem of expansion in an optimization problem are very different. However, it turns out that the same technique can be used for solving both problems. We show how the technique developed in nonlinear approximation theory, in particular the greedy expansions technique, can be adjusted for finding a sparse solution of an optimization problem given by an expansion with respect to a given dictionary.



1 Introduction 

This paper is a follow-up to the author's paper [13] on convex optimization. In [13] we began the process of adjusting greedy-type algorithms from nonlinear approximation for finding sparse solutions of convex optimization problems. In [13] we modified the three greedy algorithms most popular in nonlinear approximation in Banach spaces (the Weak Chebyshev Greedy Algorithm, the Weak Greedy Algorithm with Free Relaxation, and the Weak Relaxed Greedy Algorithm) for solving convex optimization problems. We continue to study sparse approximate solutions to convex optimization problems. We apply the technique developed in nonlinear approximation known under the name of greedy approximation. A typical problem of convex optimization is to find an approximate solution to the problem

$$\inf_{x} E(x) \tag{1.1}$$

under the assumption that $E$ is a convex function. Usually, in convex optimization the function $E$ is defined on a finite dimensional space $\mathbb{R}^n$ (see [2], [6]).
Recent needs of numerical analysis call for consideration of the above optimization problem on an infinite dimensional space, for instance, a space of continuous functions. Thus, we consider a convex function $E$ defined on a Banach space $X$. It is pointed out in [15] that in many engineering applications researchers are interested in an approximate solution of problem (1.1) as a linear combination of elements from a given system $\mathcal{D}$ of elements. There is an increasing interest in building such sparse approximate solutions using different greedy-type algorithms (see, for instance, [15], [7], [3], [11], and [13]). The problem of approximation of a given element $f \in X$ by linear combinations of elements from $\mathcal{D}$ is well studied in nonlinear approximation theory (see, for instance, [1], [11], [12]). Many known greedy-type algorithms provide such an approximation in the form of an expansion of a given element into a series with respect to a given dictionary $\mathcal{D}$. In the paper [13] we showed how some of the greedy algorithms that provide good approximation, but not an expansion, can be adjusted for solving an optimization problem. In this paper we concentrate on greedy algorithms that provide expansions, which means that the approximant at the $m$th iteration is equal to the sum of the approximant from the previous iteration (the $(m-1)$th iteration) and one element from the dictionary with an appropriate coefficient.

We point out that at first glance the setting of the problem of expansion of a given element and the setting of the expansion problem in optimization are very different. However, it turns out that the same technique can be used for solving both problems. We show how the technique developed in nonlinear approximation theory, in particular the greedy expansions technique, can be adjusted for finding a sparse solution of an optimization problem (1.1) given by an expansion with respect to a given dictionary $\mathcal{D}$.

We begin with a brief description of greedy expansion methods in Banach spaces. Let $X$ be a Banach space with norm $\|\cdot\|$. We say that a set of elements (functions) $\mathcal{D}$ from $X$ is a dictionary, respectively, symmetric dictionary, if each $g \in \mathcal{D}$ has norm bounded by one ($\|g\| \le 1$),

$$g \in \mathcal{D} \quad \text{implies} \quad -g \in \mathcal{D},$$

and the closure of $\operatorname{span}\mathcal{D}$ is $X$. In this paper symmetric dictionaries are considered. We denote the closure (in $X$) of the convex hull of $\mathcal{D}$ by $A_1(\mathcal{D})$. For a nonzero element $f \in X$ we let $F_f$ denote a norming (peak) functional for $f$:

$$\|F_f\| = 1, \qquad F_f(f) = \|f\|.$$

The existence of such a functional is guaranteed by the Hahn-Banach theorem. We assume that the set

$$D := \{x : E(x) \le E(0)\}$$

is bounded. For a bounded set $S$ define the modulus of smoothness of $E$ on $S$ as follows

$$\rho(E,u) := \rho(E,S,u) := \frac{1}{2}\sup_{x\in S,\ \|y\|=1} |E(x+uy) + E(x-uy) - 2E(x)|. \tag{1.2}$$

We assume that $E$ is Fréchet differentiable. Then convexity of $E$ implies that for any $x$, $y$

$$E(y) \ge E(x) + \langle E'(x), y-x\rangle \tag{1.3}$$

or, in other words,

$$E(x) - E(y) \le \langle E'(x), x-y\rangle = \langle -E'(x), y-x\rangle. \tag{1.4}$$
We will often use the following simple lemma.

Lemma 1.1. Let $E$ be a Fréchet differentiable convex function. Then the following inequality holds for $x \in S$

$$0 \le E(x+uy) - E(x) - u\langle E'(x), y\rangle \le 2\rho(E,S,u\|y\|). \tag{1.5}$$

Proof. The left inequality follows directly from (1.3). Next, from the definition of the modulus of smoothness it follows that

$$E(x+uy) + E(x-uy) \le 2\big(E(x) + \rho(E,S,u\|y\|)\big). \tag{1.6}$$

Inequality (1.3) gives

$$E(x-uy) \ge E(x) + \langle E'(x), -uy\rangle = E(x) - u\langle E'(x), y\rangle. \tag{1.7}$$

Combining (1.6) and (1.7), we obtain

$$E(x+uy) \le E(x) + u\langle E'(x), y\rangle + 2\rho(E,S,u\|y\|).$$

This proves the second inequality. □
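For illustration, inequality (1.5) can be checked numerically. The sketch below is only an example under our own assumptions: a diagonal quadratic $E$ on $\mathbb{R}^2$, for which the second difference in (1.2) equals $u^2 y^T Q y$, so that $\rho(E,S,u) = u^2 \max_i q_i / 2$ on any set $S$. The names `E`, `grad_E`, `rho`, `gap` and `check_lemma` are hypothetical helpers, not part of the paper.

```python
import math

# Assumed toy objective on R^2: E(x) = 0.5*(q1*x1^2 + q2*x2^2) - (b1*x1 + b2*x2).
Q = (1.0, 3.0)   # diagonal of a positive definite matrix (assumption)
B = (1.0, -2.0)  # linear term (assumption)


def E(x):
    return sum(0.5 * q * xi * xi - bi * xi for q, bi, xi in zip(Q, B, x))


def grad_E(x):
    return tuple(q * xi - bi for q, bi, xi in zip(Q, B, x))


def rho(u):
    # Modulus of smoothness (1.2) of this quadratic: u^2 * max(Q) / 2.
    return 0.5 * max(Q) * u * u


def gap(x, y, u):
    # Middle term of (1.5): E(x + u*y) - E(x) - u*<E'(x), y>.
    shifted = tuple(xi + u * yi for xi, yi in zip(x, y))
    inner = sum(gi * yi for gi, yi in zip(grad_E(x), y))
    return E(shifted) - E(x) - u * inner


def check_lemma(x, y, u, tol=1e-12):
    # True when 0 <= gap <= 2*rho(u*||y||), up to rounding.
    norm_y = math.sqrt(sum(yi * yi for yi in y))
    return -tol <= gap(x, y, u) <= 2.0 * rho(u * norm_y) + tol
```

For this $E$ the middle term of (1.5) is exactly $\frac{1}{2}u^2\sum_i q_i y_i^2$, which makes both inequalities easy to verify by hand as well.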

From the definition of a dictionary it follows that any element $f \in X$ can be approximated arbitrarily well by finite linear combinations of the dictionary elements. The primary goal of greedy expansion theory is to study representations of an element $f \in X$ by a series

$$f \sim \sum_{j=1}^{\infty} c_j(f) g_j(f), \quad g_j(f) \in \mathcal{D}, \quad c_j(f) > 0, \quad j = 1, 2, \dots. \tag{1.8}$$

In building the representation (1.8) we should construct two sequences: $\{g_j(f)\}_{j=1}^{\infty}$ and $\{c_j(f)\}_{j=1}^{\infty}$. In greedy expansion theory the construction of $\{g_j(f)\}_{j=1}^{\infty}$ is based on ideas used in greedy-type nonlinear approximation (greedy-type algorithms). This justifies the use of the term greedy expansion for (1.8). The construction of $\{g_j(f)\}_{j=1}^{\infty}$ is, clearly, the most important and difficult part in building the representation (1.8). On the basis of the contemporary theory of nonlinear approximation with respect to redundant dictionaries, we may conclude that the method of using a norming functional in greedy steps of an algorithm is the most productive in approximation in Banach spaces.
Denote

$$r_{\mathcal{D}}(f) := \sup_{F_f} \|F_f\|_{\mathcal{D}} := \sup_{F_f} \sup_{g \in \mathcal{D}} F_f(g).$$

We note that, in general, a norming functional $F_f$ is not unique. This is why we take $\sup_{F_f}$ over all norming functionals of $f$ in the definition of $r_{\mathcal{D}}(f)$. It is known that in the case of uniformly smooth Banach spaces (our primary object here) the norming functional $F_f$ is unique. In such a case we do not need $\sup_{F_f}$ in the definition of $r_{\mathcal{D}}(f)$; we have $r_{\mathcal{D}}(f) = \|F_f\|_{\mathcal{D}}$.

We begin with a description of a general scheme that provides an expansion for a given element $f$. Later, specifying this general scheme, we will obtain different methods of expansion.

Dual-Based Expansion (DBE). Let $t \in (0,1]$ and $f \ne 0$. Denote $f_0 := f$. Assume $\{f_j\}_{j=0}^{m-1} \subset X$, $\{\varphi_j\}_{j=1}^{m-1} \subset \mathcal{D}$ and a set of coefficients $\{c_j\}_{j=1}^{m-1}$ of expansion have already been constructed. If $f_{m-1} = 0$ then we stop (set $c_j = 0$, $j = m, m+1, \dots$ in the expansion) and get $f = \sum_{j=1}^{m-1} c_j \varphi_j$. If $f_{m-1} \ne 0$ then we conduct the following two steps.

(1) Choose $\varphi_m \in \mathcal{D}$ such that

$$\sup_{F_{f_{m-1}}} F_{f_{m-1}}(\varphi_m) \ge t\, r_{\mathcal{D}}(f_{m-1}).$$

(2) Define

$$f_m := f_{m-1} - c_m \varphi_m,$$

where $c_m > 0$ is a coefficient either prescribed in advance or chosen from a concrete approximation procedure.

We call the series

$$f \sim \sum_{j=1}^{\infty} c_j \varphi_j$$

the Dual-Based Expansion of $f$ with coefficients $c_j(f) := c_j$, $j = 1, 2, \dots$ with respect to $\mathcal{D}$.

Denote

$$S_m(f,\mathcal{D}) := \sum_{j=1}^{m} c_j \varphi_j.$$

Then it is clear that

$$f_m = f - S_m(f,\mathcal{D}).$$

The reader can find some convergence results for the DBE in Sections 6.7.2 and 6.7.3 of [12].

Let $\mathcal{C} := \{c_m\}_{m=1}^{\infty}$ be a fixed sequence of positive numbers. We restrict ourselves to positive numbers because of the symmetry of the dictionary $\mathcal{D}$.

X-Greedy Algorithm with coefficients $\mathcal{C}$ (XGA($\mathcal{C}$)). We define $f_0 := f$, $G_0 := 0$. Then, for each $m \ge 1$ we have the following inductive definition.

(1) $\varphi_m \in \mathcal{D}$ is such that (assuming existence)

$$\|f_{m-1} - c_m \varphi_m\|_X = \inf_{g \in \mathcal{D}} \|f_{m-1} - c_m g\|_X.$$

(2) Let

$$f_m := f_{m-1} - c_m \varphi_m, \qquad G_m := G_{m-1} + c_m \varphi_m.$$

Dual Greedy Algorithm with weakness $\tau$ and coefficients $\mathcal{C}$ (DGA($\tau$, $\mathcal{C}$)). Let $\tau := \{t_m\}_{m=1}^{\infty}$, $t_m \in [0,1]$, be a weakness sequence. We define $f_0 := f$, $G_0 := 0$. Then, for each $m \ge 1$ we have the following inductive definition.

(1) $\varphi_m \in \mathcal{D}$ is any element satisfying

$$F_{f_{m-1}}(\varphi_m) \ge t_m \|F_{f_{m-1}}\|_{\mathcal{D}}.$$

(2) Let

$$f_m := f_{m-1} - c_m \varphi_m, \qquad G_m := G_{m-1} + c_m \varphi_m.$$

In the case $\tau = \{t\}$, $t \in (0,1]$, we write $t$ instead of $\tau$ in the notation.
It is easy to see that for any Banach space $X$ its modulus of smoothness $\rho(u)$ is an even convex function satisfying the inequalities

$$\max(0, u-1) \le \rho(u) \le u, \quad u \in (0, \infty).$$

In Section 6.7.3 of [12] we considered a variant of the Dual-Based Expansion with coefficients chosen by a certain simple rule. The rule depends on two numerical parameters, $t \in (0,1]$ (the weakness parameter from the definition of the DBE) and $b \in (0,1)$ (the tuning parameter of the approximation method). The rule also depends on a majorant $\mu$ of the modulus of smoothness of the Banach space $X$.

Let $X$ be a uniformly smooth Banach space with modulus of smoothness $\rho(u)$, and let $\mu(u)$ be a continuous majorant of $\rho(u)$: $\rho(u) \le \mu(u)$, $u \in [0, \infty)$, such that $\mu(u)/u$ goes to $0$ monotonically. It is clear that $\mu(2) \ge 1$.

Dual Greedy Algorithm with parameters $(t, b, \mu)$ (DGA($t, b, \mu$)). Let $X$ and $\mu(u)$ be as above. For parameters $t \in (0,1]$, $b \in (0,1]$ we define sequences $\{f_m\}_{m=0}^{\infty}$, $\{\varphi_m\}_{m=1}^{\infty}$, $\{c_m\}_{m=1}^{\infty}$ inductively. Let $f_0 := f$. If for $m \ge 1$ $f_{m-1} = 0$ then we set $f_j = 0$ for $j \ge m$ and stop. If $f_{m-1} \ne 0$ then we conduct the following three steps.

(1) Take any $\varphi_m \in \mathcal{D}$ such that

$$F_{f_{m-1}}(\varphi_m) \ge t\, r_{\mathcal{D}}(f_{m-1}).$$

(2) Choose $c_m > 0$ from the equation

$$\|f_{m-1}\|\, \mu\big(c_m/\|f_{m-1}\|\big) = \frac{tb}{2}\, c_m\, r_{\mathcal{D}}(f_{m-1}).$$

(3) Define

$$f_m := f_{m-1} - c_m \varphi_m.$$

We note that (2) is equivalent to solving the equation

$$\frac{\mu\big(c_m/\|f_{m-1}\|\big)}{c_m/\|f_{m-1}\|} = \frac{tb}{2}\, r_{\mathcal{D}}(f_{m-1}).$$

It follows from the definitions of $t$, $b$ and $r_{\mathcal{D}}(f_{m-1})$ that the right hand side of the above equation is $\le 1/2$. Therefore, there always exists a unique solution to this equation and it satisfies the inequality

$$c_m/\|f_{m-1}\| \le 2.$$
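The three steps above can be sketched in code in the simplest setting. This is a minimal illustration under our own assumptions, not the general Banach space algorithm: we take $X = \mathbb{R}^2$ with the Euclidean norm, where the norming functional of $f$ is $g \mapsto \langle f, g\rangle/\|f\|$ and $\rho(u) = \sqrt{1+u^2} - 1 \le u^2/2$, and we choose the majorant $\mu(u) = u^2/2$. With this choice the equation in step (2) has the closed-form solution $c_m = t b \sup_{g\in\mathcal{D}}\langle f_{m-1}, g\rangle$. The dictionary and the names `make_dictionary` and `dga` are hypothetical.

```python
import math


def make_dictionary():
    # Hypothetical symmetric dictionary in R^2: unit vectors at 0, 60, 120
    # degrees together with their negatives.
    base = [(math.cos(a), math.sin(a)) for a in (0.0, math.pi / 3, 2 * math.pi / 3)]
    return base + [(-x, -y) for x, y in base]


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


def dga(f, dictionary, t=1.0, b=0.5, steps=50):
    """Sketch of DGA(t, b, mu) in Euclidean R^2 with mu(u) = u^2 / 2."""
    residual = tuple(f)
    norms = [math.hypot(*residual)]
    for _ in range(steps):
        if math.hypot(*residual) == 0.0:
            break
        # sup over the dictionary of <f_{m-1}, g> equals ||f_{m-1}|| * r_D(f_{m-1}).
        best = max(dot(residual, g) for g in dictionary)
        # Step (1): any phi with <f_{m-1}, phi> >= t * best; take the first one.
        phi = next(g for g in dictionary if dot(residual, g) >= t * best)
        # Step (2): ||f|| * mu(c/||f||) = (t*b/2) * c * r_D(f)  =>  c = t*b*best.
        c = t * b * best
        # Step (3): update the residual.
        residual = (residual[0] - c * phi[0], residual[1] - c * phi[1])
        norms.append(math.hypot(*residual))
    return residual, norms
```

In this Euclidean sketch the squared residual norm drops by at least $t^2 b(2-b)\big(\sup_g \langle f_{m-1}, g\rangle\big)^2$ per step, so the norms in the returned list are nonincreasing.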

For illustration we present two theorems on convergence and rate of convergence of the DGA($t, b, \mu$) (see Section 6.7.3 of [12]).

Theorem 1.1. Let $X$ be a uniformly smooth Banach space with the modulus of smoothness $\rho(u)$ and let $\mu(u)$ be a continuous majorant of $\rho(u)$ with the property $\mu(u)/u \downarrow 0$ as $u \to +0$. Then, for any $t \in (0,1]$ and $b \in (0,1)$ the DGA($t, b, \mu$) converges for each dictionary $\mathcal{D}$ and all $f \in X$.

Theorem 1.2. Assume $X$ has a modulus of smoothness $\rho(u) \le \gamma u^q$, $q \in (1,2]$ and $b \in (0,1)$. Denote $\mu(u) = \gamma u^q$. Then, for any dictionary $\mathcal{D}$ and any $f \in A_1(\mathcal{D})$, the rate of convergence of the DGA($t, b, \mu$) is given by



\\fm\\ < C(t,b,j,q)m , p:= 
We now formulate the corresponding generalizations of the above algorithms to the case of a smooth convex function $E$. Denote

$$E_{\mathcal{D}}(x) := \sup_{g \in \mathcal{D}} \langle -E'(x), g\rangle.$$

Gradient Based Expansion. Let $t \in (0,1]$. Denote $G_0 := 0$. Assume $\{G_j\}_{j=0}^{m-1} \subset X$, $\{\varphi_j\}_{j=1}^{m-1} \subset \mathcal{D}$ and a set of coefficients $\{c_j\}_{j=1}^{m-1}$ of expansion have already been constructed. If $E'(G_{m-1}) = 0$ then we stop (set $c_j = 0$, $j = m, m+1, \dots$ in the expansion). If $E'(G_{m-1}) \ne 0$ then we conduct the following two steps.

(1) Choose $\varphi_m \in \mathcal{D}$ such that

$$\langle -E'(G_{m-1}), \varphi_m\rangle \ge t E_{\mathcal{D}}(G_{m-1}).$$

(2) Define

$$G_m := G_{m-1} + c_m \varphi_m,$$

where $c_m > 0$ is a coefficient either prescribed in advance or chosen from a concrete approximation procedure.

We call the series

$$\sum_{j=1}^{\infty} c_j \varphi_j$$

the Gradient Based Expansion with coefficients $c_j$, $j = 1, 2, \dots$ with respect to $\mathcal{D}$.

Let $\mathcal{C} := \{c_m\}_{m=1}^{\infty}$ be a fixed sequence of positive numbers. We restrict ourselves to positive numbers because of the symmetry of the dictionary $\mathcal{D}$.

E-Greedy Algorithm with coefficients $\mathcal{C}$ (EGA($\mathcal{C}$)). We define $G_0 := 0$. Then, for each $m \ge 1$ we have the following inductive definition.

(1) $\varphi_m \in \mathcal{D}$ is such that (assuming existence)

$$E(G_{m-1} + c_m \varphi_m) = \inf_{g \in \mathcal{D}} E(G_{m-1} + c_m g).$$

(2) Let

$$G_m := G_{m-1} + c_m \varphi_m.$$

Gradient Greedy Algorithm with weakness $\tau$ and coefficients $\mathcal{C}$ (GGA($\tau$, $\mathcal{C}$)). Let $\tau := \{t_m\}_{m=1}^{\infty}$, $t_m \in [0,1]$, be a weakness sequence. We define $G_0 := 0$. Then, for each $m \ge 1$ we have the following inductive definition.

(1) $\varphi_m \in \mathcal{D}$ is any element satisfying

$$\langle -E'(G_{m-1}), \varphi_m\rangle \ge t_m E_{\mathcal{D}}(G_{m-1}).$$

(2) Let

$$G_m := G_{m-1} + c_m \varphi_m.$$

In the case $\tau = \{t\}$, $t \in (0,1]$, we write $t$ instead of $\tau$ in the notation.
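The two algorithms with prescribed coefficients can be sketched as follows. This is an illustration under our own assumptions: $X = \mathbb{R}^n$, a diagonal quadratic objective $E(x) = \frac{1}{2}\sum_i q_i x_i^2 - \sum_i b_i x_i$, and the symmetric dictionary $\{\pm e_i\}$ of signed coordinate vectors. All function names (`make_quadratic`, `signed_axes`, `gga_step`, `ega_step`, `run`) are hypothetical.

```python
def make_quadratic(q, b):
    """Toy objective E(x) = 0.5*sum(q_i x_i^2) - sum(b_i x_i) and its gradient."""
    def E(x):
        return sum(0.5 * qi * xi * xi - bi * xi for qi, bi, xi in zip(q, b, x))

    def grad(x):
        return [qi * xi - bi for qi, bi, xi in zip(q, b, x)]

    return E, grad


def signed_axes(n):
    """Symmetric dictionary {+e_i, -e_i} in R^n, stored as (index, sign) pairs."""
    return [(i, s) for i in range(n) for s in (1.0, -1.0)]


def move(x, atom, c):
    # Returns x + c * atom without modifying x.
    i, s = atom
    y = list(x)
    y[i] += c * s
    return y


def gga_step(x, grad, atoms, c, t=1.0):
    # Weak gradient greedy step: <-E'(G), phi> >= t * E_D(G), then G + c*phi.
    g = grad(x)
    scores = [(-g[i] * s, (i, s)) for i, s in atoms]
    e_d = max(score for score, _ in scores)
    atom = next(a for score, a in scores if score >= t * e_d)
    return move(x, atom, c)


def ega_step(x, E, atoms, c):
    # E-greedy step: pick the atom minimizing E(G + c*g) over the dictionary.
    return min((move(x, a, c) for a in atoms), key=E)


def run(step, n, coeffs):
    # Start from G_0 = 0 and apply one step per prescribed coefficient.
    x = [0.0] * n
    for c in coeffs:
        x = step(x, c)
    return x
```

A possible usage, with the coefficient sequence $c_k = 0.5\,k^{-2/3}$ chosen only for the example, is `run(lambda x, c: gga_step(x, grad, atoms, c), 3, coeffs)`; since the steps are prescribed rather than optimized, the iterates approach the minimizer only as fast as $c_k$ decays.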

Let $E$ be a uniformly smooth convex function with modulus of smoothness $\rho(E, D, u)$, and let $\mu(u)$ be a continuous majorant of $\rho(E, D, u)$: $\rho(E, D, u) \le \mu(u)$, $u \in [0, \infty)$, such that $\mu(u)/u$ goes to $0$ monotonically.

Gradient Greedy Algorithm with parameters $(\tau, b, \mu)$ (GGA($\tau, b, \mu$)). Let $E$ and $\mu(u)$ be as above. For parameters $\tau = \{t_k\}$, $t_k \in (0,1]$, $b \in (0,1]$ we define sequences $\{G_m\}_{m=0}^{\infty}$, $\{\varphi_m\}_{m=1}^{\infty}$, $\{c_m\}_{m=1}^{\infty}$ inductively. Let $G_0 := 0$. If for $m \ge 1$ $E'(G_{m-1}) = 0$ then we stop. If $E'(G_{m-1}) \ne 0$ then we conduct the following three steps.

(1) Take any $\varphi_m \in \mathcal{D}$ such that

$$\langle -E'(G_{m-1}), \varphi_m\rangle \ge t_m E_{\mathcal{D}}(G_{m-1}). \tag{1.11}$$

(2) Choose $c_m > 0$ from the equation

$$\mu(c_m) = \frac{t_m b}{2}\, c_m E_{\mathcal{D}}(G_{m-1}) \tag{1.12}$$

provided it has a solution $c_m > 0$ and set $c_m = 1$ otherwise.

(3) Define

$$G_m := G_{m-1} + c_m \varphi_m. \tag{1.13}$$

We note that equation (1.12) is equivalent to the equation

$$\frac{\mu(c_m)}{c_m} = \frac{t_m b}{2}\, E_{\mathcal{D}}(G_{m-1}).$$

Our assumption $E'(G_{m-1}) \ne 0$ implies that $E_{\mathcal{D}}(G_{m-1}) > 0$. Therefore, the above equation either has a solution $c_m > 0$ or $\mu(u)/u < \frac{t_m b}{2} E_{\mathcal{D}}(G_{m-1})$ for all $u$.
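A short sketch of the GGA($t, b, \mu$) for a power-type majorant may help fix ideas. The assumptions here are ours: a diagonal quadratic $E$ on $\mathbb{R}^n$, the dictionary $\{\pm e_i\}$, and $\mu(u) = \gamma u^2$ with $\gamma = \max_i q_i / 2$, which majorizes the modulus of smoothness of this particular $E$. For such $\mu$ equation (1.12) always has the solution $c_m = \frac{t b}{2\gamma} E_{\mathcal{D}}(G_{m-1})$. The function name `gga_tbmu` and the parameter name `tuning` (standing for $b$) are hypothetical.

```python
def gga_tbmu(q, b, t=1.0, tuning=0.5, steps=400):
    """Sketch of GGA(t, b, mu), mu(u) = gamma*u^2, for
    E(x) = 0.5*sum(q_i x_i^2) - sum(b_i x_i) and the dictionary {+e_i, -e_i}."""
    gamma = max(q) / 2.0  # assumption: makes mu a majorant of rho(E, u)
    n = len(q)

    def E(x):
        return sum(0.5 * qi * xi * xi - bi * xi for qi, bi, xi in zip(q, b, x))

    x = [0.0] * n
    history = [(E(x), 0.0, 0.0)]  # entries: (E(G_m), c_m, E_D(G_{m-1}))
    for _ in range(steps):
        g = [qi * xi - bi for qi, bi, xi in zip(q, b, x)]
        e_d = max(abs(gi) for gi in g)  # E_D(G_{m-1}) for signed coordinate atoms
        if e_d == 0.0:
            break
        # Step (1): first coordinate whose signed atom satisfies (1.11).
        i = next(j for j in range(n) if abs(g[j]) >= t * e_d)
        sign = -1.0 if g[i] > 0 else 1.0
        # Step (2): gamma*c^2 = (t*b/2)*c*E_D  =>  c = t*b*E_D / (2*gamma).
        c = t * tuning / (2.0 * gamma) * e_d
        # Step (3): update the approximant.
        x[i] += sign * c
        history.append((E(x), c, e_d))
    return x, history
```

In this setting each step satisfies $E(G_m) \le E(G_{m-1}) - t(1-b)c_m E_{\mathcal{D}}(G_{m-1})$, which can be checked directly on the returned history.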

The greedy step (1) in the above algorithm is a standard greedy step which is based on $E'(G_{m-1})$. The choice of the coefficient $c_m$ from equation (1.12) requires knowledge of both $E_{\mathcal{D}}(G_{m-1})$ and $\mu(u)$. The quantity $E_{\mathcal{D}}(G_{m-1})$ can be computed (in case $X$ is finite dimensional and $\mathcal{D}$ is finite). The function $\mu(u)$ comes from our assumption on $E$ and may be one which does not describe the smoothness of $E$ in the best way. Here is a modification of the GGA($\tau, b, \mu$) which does not use $\mu$.

Gradient E-Greedy Algorithm with parameters $(\tau)$ (GEGA($\tau$)). Let $E$ be a uniformly smooth convex function. For parameters $\tau = \{t_k\}$, $t_k \in (0,1]$ we define sequences $\{G_m\}_{m=0}^{\infty}$, $\{\varphi_m\}_{m=1}^{\infty}$, $\{c_m\}_{m=1}^{\infty}$ inductively. Let $G_0 := 0$. If for $m \ge 1$ $E'(G_{m-1}) = 0$ then we stop. If $E'(G_{m-1}) \ne 0$ then we conduct the following three steps.

(1) Take any $\varphi_m \in \mathcal{D}$ such that

$$\langle -E'(G_{m-1}), \varphi_m\rangle \ge t_m E_{\mathcal{D}}(G_{m-1}). \tag{1.14}$$

(2) Choose $c_m$ from the equation

$$E(G_{m-1} + c_m \varphi_m) = \min_{c} E(G_{m-1} + c \varphi_m). \tag{1.15}$$

(3) Define

$$G_m := G_{m-1} + c_m \varphi_m. \tag{1.16}$$
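The GEGA can be sketched for a quadratic objective, where the one-dimensional minimization (1.15) has a closed form. The setting is our assumption: $E(x) = \frac{1}{2}x^T Q x - b^T x$ on $\mathbb{R}^n$ with a symmetric positive definite $Q$ and a finite symmetric dictionary given as a list of vectors, so that the minimizer over $c$ is $c_m = \langle -E'(G_{m-1}), \varphi_m\rangle / (\varphi_m^T Q \varphi_m)$. The name `gega` is hypothetical.

```python
def gega(Q, b, atoms, t=1.0, steps=40):
    """Sketch of GEGA(t) for E(x) = 0.5*x^T Q x - b^T x (Q symmetric positive definite)."""
    n = len(b)

    def E(x):
        quad = sum(x[i] * Q[i][j] * x[j] for i in range(n) for j in range(n))
        return 0.5 * quad - sum(bi * xi for bi, xi in zip(b, x))

    def grad(x):
        return [sum(Q[i][j] * x[j] for j in range(n)) - b[i] for i in range(n)]

    x = [0.0] * n
    values = [E(x)]
    for _ in range(steps):
        g = grad(x)
        scores = [-sum(gi * ai for gi, ai in zip(g, a)) for a in atoms]
        e_d = max(scores)  # E_D(G_{m-1}) over the finite dictionary
        if e_d <= 0.0:
            break
        # Step (1): first atom with <-E'(G), phi> >= t * E_D(G).
        phi, slope = next((a, s) for a, s in zip(atoms, scores) if s >= t * e_d)
        # Step (2): exact line search, valid because E is quadratic.
        curvature = sum(phi[i] * Q[i][j] * phi[j] for i in range(n) for j in range(n))
        c = slope / curvature
        # Step (3): update the approximant.
        x = [xi + c * ai for xi, ai in zip(x, phi)]
        values.append(E(x))
    return x, values
```

With the coordinate dictionary $\{\pm e_1, \pm e_2\}$ this reduces to greedy coordinate descent with exact line search, so the objective values are nonincreasing along the iteration.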

Our main interest in this paper is in the analysis of greedy constructions of sparse approximants for convex optimization problems with respect to an arbitrary dictionary $\mathcal{D}$. We now give a comment that relates the above algorithms to classical gradient-type algorithms and thus justifies the use of the term gradient in the names of these algorithms. We specify our dictionary $\mathcal{D}$ to be the unit sphere $S := \{g \in X : \|g\| = 1\}$ of the space $X$. Then

$$E_{\mathcal{D}}(x) = \|E'(x)\|_{X^*}.$$

Therefore, the greedy step from the Gradient Based Expansion, the GGA($\tau$, $\mathcal{C}$), and the GGA($\tau, b, \mu$) takes the form: choose $\varphi_m \in \mathcal{D}$ such that

$$\langle -E'(G_{m-1}), \varphi_m\rangle \ge t_m \|E'(G_{m-1})\|_{X^*}.$$

In particular, when $X = \mathbb{R}^n$ equipped with the Euclidean norm and $t_m = 1$ we obtain that

$$\varphi_m = -E'(G_{m-1})/\|E'(G_{m-1})\|_2$$

is a unit vector in the direction opposite to the direction of the gradient $E'(G_{m-1})$. In this case the GGA($\{1\}, b, \mu$) with $\mu(u) = \gamma u^2$ chooses the step size $c_m$ from the equation

$$\gamma c_m^2 = \frac{b}{2}\, c_m \|E'(G_{m-1})\|_2 \quad \Rightarrow \quad c_m = \frac{b}{2\gamma} \|E'(G_{m-1})\|_2.$$

Thus

$$G_m = G_{m-1} + c_m \varphi_m = G_{m-1} - \frac{b}{2\gamma} E'(G_{m-1}),$$

which describes the classical Gradient Method.
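The identification with the Gradient Method can be checked numerically. The sketch below assumes $X = \mathbb{R}^n$ with the Euclidean norm, $\mathcal{D}$ the unit sphere, $t_m = 1$ and $\mu(u) = \gamma u^2$; the function names `gga_sphere_step` and `gradient_step` are ours, and `grad` stands for any user-supplied gradient of $E$.

```python
import math


def gga_sphere_step(G, grad, b, gamma):
    """One step of GGA({1}, b, mu), mu(u) = gamma*u^2, with D the Euclidean unit sphere."""
    g = grad(G)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        return list(G)  # E'(G) = 0: the algorithm stops
    phi = [-gi / norm for gi in g]   # greedy direction with t = 1
    c = b / (2.0 * gamma) * norm     # gamma*c^2 = (b/2)*c*||E'(G)||_2
    return [Gi + c * pi for Gi, pi in zip(G, phi)]


def gradient_step(G, grad, step):
    """Classical gradient step G - step * E'(G)."""
    return [Gi - step * gi for Gi, gi in zip(G, grad(G))]
```

Both functions return the same point when `step` equals `b / (2 * gamma)`, up to rounding.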



2 Convergence of the Gradient Based Expansion

In this section we assume that the sets

$$D_C := \{x : E(x) \le E(0) + C\}$$

are bounded for all finite $C$ and that for any bounded set $\Omega$ we have

$$\sup_{x \in \Omega} \|E'(x)\|_{X^*} < \infty. \tag{2.1}$$

We begin with the following lemma.

Lemma 2.1. Let $E$ be a Fréchet differentiable convex function satisfying the above assumptions. Assume that the approximants $\{G_j\}_{j=0}^{\infty}$ and coefficients $\{c_j\}_{j=1}^{\infty}$ from the Gradient Based Expansion satisfy the following two conditions

$$\sum_{j=1}^{\infty} c_j E_{\mathcal{D}}(G_j) < \infty, \tag{2.2}$$

$$\sum_{j=1}^{\infty} c_j = \infty. \tag{2.3}$$

Then

$$\liminf_{m \to \infty} E(G_m) = \inf_{x \in D} E(x). \tag{2.4}$$



Proof. By (1.4)

$$E(G_m) - E(G_{m-1}) \le \langle E'(G_m), G_m - G_{m-1}\rangle = c_m \langle E'(G_m), \varphi_m\rangle.$$

This implies

$$E(G_m) \le E(G_{m-1}) + c_m E_{\mathcal{D}}(G_m).$$

Using our assumption (2.2) we obtain

$$E(G_m) \le E(0) + \sum_{j=1}^{m} c_j E_{\mathcal{D}}(G_j) \le E(0) + C_1.$$

By (2.1) we get from here for all $m$

$$\|E'(G_m)\|_{X^*} \le C_2.$$

Denote $s_n := \sum_{j=1}^{n} c_j$. Then (2.3) implies (see [1], p. 904) that

$$\sum_{n=1}^{\infty} \frac{c_n}{s_n} = \infty. \tag{2.5}$$

Using (2.2) we get

$$\sum_{n=1}^{\infty} s_n E_{\mathcal{D}}(G_n) \frac{c_n}{s_n} = \sum_{n=1}^{\infty} c_n E_{\mathcal{D}}(G_n) < \infty.$$

Thus, by (2.5),

$$\liminf_{n \to \infty} s_n E_{\mathcal{D}}(G_n) = 0.$$

Let

$$\lim_{k \to \infty} s_{n_k} E_{\mathcal{D}}(G_{n_k}) = 0. \tag{2.6}$$

Consider $\{E'(G_{n_k})\}$. A closed bounded set in the dual $X^*$ is weakly* compact (see [5], p. 45). Let $\{F_i\}_{i=1}^{\infty}$, $F_i := -E'(G_{n_{k_i}})$, be a $w^*$-convergent subsequence. Denote

$$F := w^*\text{-}\lim_{i \to \infty} F_i.$$

We complete the proof of Lemma 2.1 by contradiction. We assume that (2.4) does not hold, that is, there exist $\alpha > 0$ and $N \in \mathbb{N}$ such that

$$E(G_m) - \inf_{x \in D} E(x) \ge 2\alpha, \quad m \ge N, \tag{2.7}$$

and then derive a contradiction.

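We begin by deducing from (2.7) that $F \ne 0$. Indeed, by (2.7) there exists $f \in D$ such that

$$E(G_m) - E(f) \ge \alpha, \quad m \ge N. \tag{2.8}$$

By (1.4) we obtain

$$\langle -E'(G_m), f - G_m\rangle \ge E(G_m) - E(f) \ge \alpha. \tag{2.9}$$

Next, we have

$$\langle F, f\rangle = \lim_{i \to \infty} \langle F_i, f\rangle \tag{2.10}$$

and

$$|\langle F_i, G_{n_{k_i}}\rangle| = \Big|\Big\langle F_i, \sum_{j=1}^{n_{k_i}} c_j \varphi_j\Big\rangle\Big| = \Big|\sum_{j=1}^{n_{k_i}} c_j \langle F_i, \varphi_j\rangle\Big| \le s_{n_{k_i}} E_{\mathcal{D}}(G_{n_{k_i}}) \to 0 \tag{2.11}$$

for $i \to \infty$. Relations (2.10), (2.11) and (2.9) imply that $\langle F, f\rangle \ge \alpha$, and hence $F \ne 0$. This implies that there exists $g \in \mathcal{D}$ for which $\langle F, g\rangle > 0$. However,

$$\langle F, g\rangle = \lim_{i \to \infty} \langle F_i, g\rangle \le \lim_{i \to \infty} E_{\mathcal{D}}(G_{n_{k_i}}) = 0.$$

We have a contradiction, which completes the proof of Lemma 2.1. □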

3 Convergence of GGA($\tau$, $\mathcal{C}$) and EGA($\mathcal{C}$)

We begin with a simple lemma. 
Lemma 3.1. Let $f$, $A > 0$, be such that

$$f/A \in A_1(\mathcal{D}).$$

Then for

$$G_k := \sum_{j=1}^{k} c_j \varphi_j, \quad \varphi_j \in \mathcal{D}, \quad j = 1, \dots, k,$$

we have

$$E_{\mathcal{D}}(G_k) \ge (E(G_k) - E(f))/(A + A_k), \qquad A_k := \sum_{j=1}^{k} |c_j|.$$

Proof. We have by (1.4)

$$\langle -E'(G_k), f - G_k\rangle \ge E(G_k) - E(f). \tag{3.1}$$

Next,

$$|\langle -E'(G_k), f\rangle| \le A E_{\mathcal{D}}(G_k), \tag{3.2}$$

$$|\langle -E'(G_k), G_k\rangle| \le E_{\mathcal{D}}(G_k) \sum_{j=1}^{k} |c_j|. \tag{3.3}$$

Inequalities (3.1)-(3.3) imply the statement of Lemma 3.1.

□
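The inequality of Lemma 3.1 is easy to test numerically in a finite dimensional example. The sketch below assumes a diagonal quadratic $E$ on $\mathbb{R}^n$ and the dictionary $\{\pm e_i\}$, for which $A_1(\mathcal{D})$ is the unit ball of $\ell_1$ and $E_{\mathcal{D}}(x) = \max_i |\partial_i E(x)|$. The function name `lemma31_holds` and the encoding of $G_k$ as a list of `(c_j, index, sign)` triples are our own conventions.

```python
def lemma31_holds(q, b, f, A, terms, tol=1e-12):
    """Check E_D(G_k) * (A + A_k) >= E(G_k) - E(f) for
    E(x) = 0.5*sum(q_i x_i^2) - sum(b_i x_i) and D = {+e_i, -e_i}.

    terms: list of (c_j, index, sign) describing G_k = sum_j c_j * sign * e_index.
    """
    if sum(abs(fi) for fi in f) > A + tol:
        raise ValueError("f/A must lie in the l1 unit ball, which is A_1(D) here")

    def E(x):
        return sum(0.5 * qi * xi * xi - bi * xi for qi, bi, xi in zip(q, b, x))

    G = [0.0] * len(q)
    for c, i, s in terms:
        G[i] += c * s
    A_k = sum(abs(c) for c, _, _ in terms)
    e_d = max(abs(qi * xi - bi) for qi, bi, xi in zip(q, b, G))
    return e_d * (A + A_k) >= E(G) - E(f) - tol
```

For example, with $q = (1, 2, 4)$, $b = (1, -2, 2)$, $f = (1, -1, 0.5)$, $A = 2.5$ and $G_k = (0.7, -0.3, 0)$ the left side is $2 \cdot 3.5 = 7$ and the right side is $-0.965 + 2 = 1.035$.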

We now proceed to a convergence result for a general uniformly smooth convex function $E$.

Theorem 3.1. Let $E$ be a uniformly smooth convex function satisfying

$$E(x + uy) - E(x) - u\langle E'(x), y\rangle \le 2\mu(u), \tag{3.4}$$

for $x \in D_2$, $\|y\| = 1$, $|u| \le 1$ with $\mu(u) = o(u)$ as $u \to 0$. Assume that the coefficients sequence $\mathcal{C} := \{c_j\}$, $c_j \in [0,1]$ satisfies the conditions

$$\sum_{k=1}^{\infty} c_k = \infty, \tag{3.5}$$

$$\sum_{k=1}^{\infty} \mu(c_k) \le 1. \tag{3.6}$$

Then for the GGA($t$, $\mathcal{C}$) and for the EGA($\mathcal{C}$) we have for each dictionary $\mathcal{D}$

$$\lim_{m \to \infty} E(G_m) = \inf_{x \in D} E(x).$$

Proof. We give here a proof that works for both algorithms from Theorem 3.1. Let $G_{m-1}$ be an approximate solution after $m-1$ iterations of either the GGA($t$, $\mathcal{C}$) or the EGA($\mathcal{C}$). Let $\varphi_m$ be such that

$$\langle -E'(G_{m-1}), \varphi_m\rangle \ge t E_{\mathcal{D}}(G_{m-1}). \tag{3.7}$$

Then

$$\inf_{g \in \mathcal{D}} E(G_{m-1} + c_m g) \le E(G_{m-1} + c_m \varphi_m).$$

Thus, in both cases (GGA($t$, $\mathcal{C}$) and EGA($\mathcal{C}$)) it is sufficient to estimate $E(G_{m-1} + c_m \varphi_m)$ with $\varphi_m$ satisfying (3.7). By (3.4) under the assumption that $G_{m-1} \in D_2$ we get

$$E(G_{m-1} + c_m \varphi_m) \le E(G_{m-1}) + c_m \langle E'(G_{m-1}), \varphi_m\rangle + 2\mu(c_m).$$

Using the definition of $\varphi_m$ we continue

$$\le E(G_{m-1}) - c_m t E_{\mathcal{D}}(G_{m-1}) + 2\mu(c_m). \tag{3.8}$$

We now prove by induction that $G_m \in D_2$ for all $m$. Indeed, clearly $G_0 \in D_2$. Suppose that $G_k \in D_2$, $k = 0, 1, \dots, m-1$; then (3.8) holds for all $k = 1, \dots, m$ instead of $m$ and, therefore,

$$E(G_m) \le E(0) + 2\sum_{k=1}^{m} \mu(c_k) \le E(0) + 2,$$

which implies that $G_m \in D_2$.

Let $f^\varepsilon$, $\varepsilon > 0$, $A(\varepsilon) > 0$, be such that

$$E(f^\varepsilon) - b \le \varepsilon, \quad f^\varepsilon/A(\varepsilon) \in A_1(\mathcal{D}), \quad b := \inf_{x \in D} E(x).$$

Applying Lemma 3.1 we obtain from (3.8) (with $A_m := \sum_{j=1}^{m} c_j$)

$$E(G_{m-1} + c_m \varphi_m) \le E(G_{m-1}) - \frac{t c_m (E(G_{m-1}) - b - \varepsilon)}{A(\varepsilon) + A_{m-1}} + 2\mu(c_m). \tag{3.9}$$

Denote

$$a_n := E(G_n) - b - \varepsilon.$$

By (3.9) we obtain

$$a_m \le a_{m-1}(1 - \theta_m) + 2\mu(c_m) \tag{3.10}$$

with

$$\theta_m := \frac{t c_m}{A(\varepsilon) + A_{m-1}}.$$

We note that our assumption (3.5) implies that

$$\sum_{m=1}^{\infty} \theta_m = \infty. \tag{3.11}$$

Without loss of generality we can assume that $A(\varepsilon) \ge 1$. Then $\theta_m \le 1$ and we get from (3.10)

$$a_m \le a_0 \prod_{j=1}^{m} (1 - \theta_j) + 2\mu(c_1) \prod_{j=2}^{m} (1 - \theta_j) + \dots + 2\mu(c_{m-1})(1 - \theta_m) + 2\mu(c_m). \tag{3.12}$$

The properties (3.11) and $\sum_{m} \mu(c_m) < \infty$ imply that

$$\limsup_{m \to \infty} a_m \le 0.$$

This completes the proof of Theorem 3.1. □

4 Rate of convergence of GGA($\tau$, $\mathcal{C}$) and EGA($\mathcal{C}$)

In this section we consider the GGA($t$, $\mathcal{C}$) and the EGA($\mathcal{C}$) with a specific sequence $\mathcal{C}$. For a special $\mathcal{C}$ we prove rate of convergence results for uniformly smooth convex functions with modulus of smoothness $\rho(E, u) \le \gamma u^q$, $q \in (1,2]$.

Theorem 4.1. Let $E$ be a uniformly smooth convex function with modulus of smoothness $\rho(E, u) \le \gamma u^q$, $q \in (1,2]$ on $D_2$. We set $s := \frac{t+1}{t+q}$ and $\mathcal{C}_s := \{c k^{-s}\}_{k=1}^{\infty}$ with $c$ chosen in such a way that $\gamma c^q \sum_{k=1}^{\infty} k^{-sq} = 1$. Then the GGA($t$, $\mathcal{C}_s$) and EGA($\mathcal{C}_s$) (for this algorithm $t = 1$) converge with the following rate: for any $r \in (0, t(1-s))$

$$E(G_m) - \inf_{x:\, x/M \in A_1(\mathcal{D})} E(x) \le C(r, t, q, \gamma, M) m^{-r}.$$

Proof. In the same way as in the proof of Theorem 3.1 we prove that $G_m \in D_2$ for all $m$. Then we use inequality (3.9) proved in Section 3. Let $f^\varepsilon$, $\varepsilon > 0$, $M > 0$, be such that

$$E(f^\varepsilon) - b \le \varepsilon, \quad f^\varepsilon/M \in A_1(\mathcal{D}), \quad b := \inf_{x:\, x/M \in A_1(\mathcal{D})} E(x).$$

Using the assumption $f^\varepsilon/M \in A_1(\mathcal{D})$, we write (3.9) with $A(\varepsilon) = M$

$$E(G_{m-1} + c_m \varphi_m) \le E(G_{m-1}) - \frac{t c_m (E(G_{m-1}) - b - \varepsilon)}{M + A_{m-1}} + 2\gamma c_m^q. \tag{4.1}$$

We have

$$A_{m-1} = c \sum_{k=1}^{m-1} k^{-s} \le c\Big(1 + \int_1^m x^{-s}\,dx\Big) = c\big(1 + (1-s)^{-1}(m^{1-s} - 1)\big)$$

and

$$M + A_{m-1} \le M + c(1-s)^{-1} m^{1-s}.$$

Therefore, for $m \ge N$ we have with $v := (r + t(1-s))/2$

$$\frac{t c_m}{M + A_{m-1}} \ge \frac{v + t(1-s)}{2m}. \tag{4.2}$$

We need the following technical lemma. This lemma is a more general version of Lemma 2.1 from [8] (see also Remark 5.1 in [10] and Lemma 2.37 on p. 106 of [12]).

Lemma 4.1. Let four positive numbers $\alpha < \beta \le 1$, $A$, $U \in \mathbb{N}$ be given and let a sequence $\{a_n\}_{n=1}^{\infty}$ have the following properties: $a_1 < A$ and we have for all $n \ge 2$

$$a_n \le a_{n-1} + A(n-1)^{-\alpha}; \tag{4.3}$$

if for some $\nu \ge U$ we have

$$a_\nu \ge A \nu^{-\alpha}$$

then

$$a_{\nu+1} \le a_\nu (1 - \beta/\nu). \tag{4.4}$$

Then there exists a constant $C = C(\alpha, \beta, A, U)$ such that for all $n = 1, 2, \dots$ we have

$$a_n \le C n^{-\alpha}.$$

We apply this lemma with $a_n := E(G_n) - b - \varepsilon$, $\alpha := r$, $\beta := v := (r + t(1-s))/2$, $U = N$ and $A$ specified later. Let us check the conditions (4.3) and (4.4) of Lemma 4.1. By the inequality

$$E(G_m) \le E(G_{m-1}) + 2\rho(E, c_m) \le E(G_{m-1}) + 2\gamma c^q m^{-sq}$$

the condition (4.3) holds for $A \ge 2\gamma c^q$. Assume that $a_m \ge A m^{-r}$. Then using $sq \ge 1 + r$ we get

$$c_m^q = c^q m^{-sq} \le c^q m^{-1-r}. \tag{4.5}$$
Setting $A$ to be big enough to satisfy

$$\frac{2\gamma c^q}{A m} \le \frac{t(1-s) - v}{2m}$$

we obtain from (4.1), (4.2), and (4.5)

$$a_{m+1} \le a_m (1 - \beta/m)$$

provided $a_m \ge A m^{-r}$. Thus (4.4) holds. Applying Lemma 4.1 we get

$$a_m \le C(r, t, q, \gamma, M) m^{-r}.$$

□
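To make the choice of parameters in Theorem 4.1 concrete, the following sketch computes the exponent $s$, the scale $c$ and the upper end $t(1-s)$ of the admissible rates $r$. It assumes $s = (t+1)/(t+q)$, the value for which $sq = 1 + t(1-s)$, and the normalization $\gamma c^q \sum_{k\ge1} k^{-sq} = 1$; the series is approximated by a partial sum plus an integral estimate of the tail. The function names `zeta_partial` and `theorem41_parameters` are hypothetical.

```python
import math


def zeta_partial(p, terms=1000):
    """Approximate sum_{k>=1} k^(-p) for p > 1: partial sum plus
    an integral estimate of the tail with a midpoint-type correction."""
    head = sum(k ** (-p) for k in range(1, terms + 1))
    tail = terms ** (1.0 - p) / (p - 1.0) - 0.5 * terms ** (-p)
    return head + tail


def theorem41_parameters(t, q, gamma):
    """Return (s, c, r_max): c_k = c * k^(-s), admissible rates are r in (0, r_max)."""
    s = (t + 1.0) / (t + q)                  # assumed form of the exponent s
    series = zeta_partial(s * q)             # s*q > 1 whenever q > 1
    c = (gamma * series) ** (-1.0 / q)       # gamma * c^q * series = 1
    return s, c, t * (1.0 - s)
```

For instance, `theorem41_parameters(1.0, 2.0, 1.0)` gives $s = 2/3$ and admissible rates $r < 1/3$.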

We note that in the special case when $\mathcal{D}$ is the unit sphere $S$ of $X$ the rate of convergence in Theorem 4.1 can be improved.

Theorem 4.2. Let $E$ be a uniformly smooth convex function with modulus of smoothness $\rho(E, u) \le \gamma u^q$, $q \in (1,2]$ on $D_2$ which we assume to be bounded. For a $\delta \in (0,1)$ we set $s := 1 - \delta$ and $\mathcal{C}_s := \{c k^{-s}\}_{k=1}^{\infty}$ with $c := c(\delta)$ chosen in such a way that $\gamma c^q \sum_{k=1}^{\infty} k^{-sq} = 1$. Suppose $\mathcal{D} = S$. Then the GGA($t$, $\mathcal{C}_s$) and EGA($\mathcal{C}_s$) (for this algorithm $t = 1$) converge with the following rate:

$$E(G_m) - \inf_{x \in D} E(x) \le C(E, \delta, q, \gamma, t) m^{-s(q-1)}.$$

Proof. As we already mentioned in the Introduction, in the case $\mathcal{D} = S$ we have

$$E_{\mathcal{D}}(x) = \|E'(x)\|_{X^*}.$$

By (3.4) under the assumption that $G_{m-1} \in D_2$ we get

$$E(G_{m-1} + c_m \varphi_m) \le E(G_{m-1}) + c_m \langle E'(G_{m-1}), \varphi_m\rangle + 2\gamma (c_m)^q.$$

Using the definition of $\varphi_m$ we continue

$$\le E(G_{m-1}) - c_m t \|E'(G_{m-1})\|_{X^*} + 2\gamma (c_m)^q. \tag{4.6}$$

As in the proof of Theorem 3.1 we derive from here that $G_m \in D_2$ for all $m$. Using the notation $a_m := E(G_m) - \inf_{x \in D} E(x)$ we obtain

$$a_{m-1} = \sup_{f \in D} (E(G_{m-1}) - E(f)) \le \|E'(G_{m-1})\|_{X^*} \sup_{f \in D} \|G_{m-1} - f\|. \tag{4.7}$$

Inequality (4.7) and our assumption that $D_2$ is bounded imply

$$\|E'(G_{m-1})\|_{X^*} \ge a_{m-1}/C_1.$$

Substituting this bound into (4.6) we get

$$a_m \le a_{m-1}(1 - t c_m C_1^{-1}) + 2\gamma (c_m)^q. \tag{4.8}$$

As in the proof of Theorem 4.1 we use Lemma 4.1. It is clear that for $a_{m-1}$ satisfying

$$a_{m-1} \ge A m^{-s(q-1)}$$

with large enough $A$ we have

$$a_{m-1} t c_m C_1^{-1} \ge 4\gamma (c_m)^q.$$

Therefore, (4.8) gives in this case

$$a_m \le a_{m-1}\Big(1 - \frac{t c_m}{2 C_1}\Big). \tag{4.9}$$

It follows from the definition of $c_m$ that

$$\frac{t c_m}{2 C_1} \ge \frac{s(q-1) + 1}{m - 1} \quad \text{for } m \ge U.$$

Thus by Lemma 4.1 we obtain

$$a_m \le C(E, \delta, q, \gamma, t) m^{-s(q-1)},$$

which proves Theorem 4.2. □



5 Convergence and rate of convergence of the GGA($\tau, b, \mu$)

We begin with a convergence result.

Theorem 5.1. Let $E$ be a uniformly smooth convex function with the modulus of smoothness $\rho(E, D, u)$ and let $\mu(u)$ be a continuous majorant of $\rho(E, D, u)$ with the property $\mu(u)/u \downarrow 0$ as $u \to +0$. Assume that for $x \in D$

$$\|E'(x)\|_{X^*} \le C_D.$$

Then, for any $t \in (0,1]$ and $b \in (0,1)$ we have for the GGA($t, b, \mu$)

$$\lim_{m \to \infty} E(G_m) = \inf_{x \in D} E(x). \tag{5.1}$$

Proof. In this case $\tau = \{t\}$, $t \in (0,1]$. We have by (3.4) under the assumption that $G_{m-1} \in D$

$$E(G_{m-1} + c_m \varphi_m) \le E(G_{m-1}) + c_m \langle E'(G_{m-1}), \varphi_m\rangle + 2\mu(c_m).$$

Using the definition of $\varphi_m$ we continue

$$\le E(G_{m-1}) - c_m t E_{\mathcal{D}}(G_{m-1}) + 2\mu(c_m). \tag{5.2}$$

Using the choice of $c_m$ we find

$$E(G_m) \le E(G_{m-1}) - t(1-b) c_m E_{\mathcal{D}}(G_{m-1}). \tag{5.3}$$

In particular, (5.3) implies that $\{E(G_m)\}$ is a monotone decreasing sequence and therefore our assumption that $G_{m-1} \in D$ implies that $G_m \in D$. Clearly, $G_0 \in D$. Thus we obtain that $G_m \in D$ for all $m$. Also, (5.3) implies that

$$t(1-b) c_m E_{\mathcal{D}}(G_{m-1}) \le E(G_{m-1}) - E(G_m).$$

Thus

$$\sum_{m=1}^{\infty} c_m E_{\mathcal{D}}(G_{m-1}) < \infty. \tag{5.4}$$

We have the following two cases:

$$\text{(I)} \quad \sum_{m=1}^{\infty} c_m = \infty, \qquad \text{(II)} \quad \sum_{m=1}^{\infty} c_m < \infty.$$

First, we consider case (I). Our argument here is as in Lemma 2.1. Denote $s_n := \sum_{j=1}^{n} c_j$. Then our assumption implies (see [1], p. 904) that

$$\sum_{n=1}^{\infty} \frac{c_n}{s_n} = \infty. \tag{5.5}$$

Using (5.4) we get

$$\sum_{n=1}^{\infty} s_n E_{\mathcal{D}}(G_{n-1}) \frac{c_n}{s_n} = \sum_{n=1}^{\infty} c_n E_{\mathcal{D}}(G_{n-1}) < \infty.$$

Thus, by (5.5),

$$\liminf_{n \to \infty} s_n E_{\mathcal{D}}(G_{n-1}) = 0.$$

Clearly, the above relation implies

$$\liminf_{n \to \infty} s_n E_{\mathcal{D}}(G_n) = 0.$$

The rest of the proof in this case repeats the corresponding part from the proof of Lemma 2.1. As a result we obtain

$$\liminf_{m \to \infty} E(G_m) = \inf_{x \in D} E(x).$$

Monotonicity of $\{E(G_m)\}$ implies that we can replace $\liminf$ by $\lim$ in the above relation.

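Second, we consider the case (II). Our assumption implies that $c_m \to 0$ as $m \to \infty$. From the definition (1.12) of $c_m$ we obtain

$$E_{\mathcal{D}}(G_{m-1}) = \frac{2}{tb}\, \mu(c_m)/c_m \to 0, \quad m \to \infty. \tag{5.6}$$

We show that relation (5.6) implies the following two properties (5.7) and (5.8):

$$\lim_{m \to \infty} \langle E'(G_m), G_m\rangle = 0, \tag{5.7}$$

$$\lim_{m \to \infty} \langle E'(G_m), f\rangle = 0. \tag{5.8}$$

Indeed, for (5.7) we have

$$|\langle E'(G_m), G_m\rangle| = \Big|\sum_{j=1}^{m} \langle -E'(G_m), \varphi_j\rangle c_j\Big| \le E_{\mathcal{D}}(G_m) \sum_{j=1}^{m} c_j \to 0.$$

We now prove (5.8). For arbitrary $\varepsilon > 0$ find $f^\varepsilon$ such that

$$\|f - f^\varepsilon\| \le \varepsilon, \quad f^\varepsilon/A(\varepsilon) \in A_1(\mathcal{D}),$$

with some $A(\varepsilon)$. Then

$$|\langle E'(G_m), f\rangle| = |\langle E'(G_m), f^\varepsilon\rangle + \langle E'(G_m), f - f^\varepsilon\rangle| \le E_{\mathcal{D}}(G_m) A(\varepsilon) + C_D \varepsilon.$$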

We complete the proof of case (II) by contradiction. We assume that (5.1) does not hold, that is, there exist $\alpha > 0$ and $N \in \mathbb{N}$ such that

$$E(G_m) - \inf_{x \in D} E(x) \ge 2\alpha, \quad m \ge N, \tag{5.9}$$

and then derive a contradiction. By (5.9) there exists $f \in D$ such that

$$E(G_m) - E(f) \ge \alpha, \quad m \ge N. \tag{5.10}$$

By (1.4) we obtain

$$\langle -E'(G_m), f - G_m\rangle \ge E(G_m) - E(f) \ge \alpha. \tag{5.11}$$

This contradicts (5.7) and (5.8).

□

Theorem 5.2. Let $E$ be a uniformly smooth convex function. Assume that for $x \in D$

$$\|E'(x)\|_{X^*} \le C_D.$$

Then, for any $t \in (0,1]$ we have for the GEGA($\{t\}$)

$$\lim_{m \to \infty} E(G_m) = \inf_{x \in D} E(x). \tag{5.12}$$

Proof. Let $E$ be a uniformly smooth convex function with the modulus of smoothness $\rho(E, D, u)$ and let $\mu(u)$ be a continuous majorant of $\rho(E, D, u)$ with the property $\mu(u)/u \downarrow 0$ as $u \to +0$. As in (5.3) we obtain

$$E(G_m) \le E(G_{m-1} + c'_m \varphi_m) \le E(G_{m-1}) - t(1-b) c'_m E_{\mathcal{D}}(G_{m-1}) \tag{5.13}$$

with $c'_m$ chosen from the equation

$$\mu(c'_m) = \frac{tb}{2}\, c'_m E_{\mathcal{D}}(G_{m-1}) \tag{5.14}$$

with some fixed $b \in (0,1)$.

The proof of Theorem 5.1 used only assumptions on $E$ and analogs of relations (5.13) and (5.14). Therefore the same proof gives (5.12). □

We proceed to study the rate of convergence of the GGA($\tau, b, \mu$) for a uniformly smooth convex function with a power-type majorant of the modulus of smoothness: $\rho(E, D, u) \le \mu(u) = \gamma u^q$, $1 < q \le 2$.

Theorem 5.3. Let $\tau := \{t_k\}_{k=1}^{\infty}$ be a nonincreasing sequence $1 \ge t_1 \ge t_2 \ge \dots > 0$ and $b \in (0,1)$. Assume that the uniformly smooth convex function $E$ has a modulus of smoothness $\rho(E, D, u) \le \gamma u^q$, $q \in (1,2]$. Denote $\mu(u) = \gamma u^q$. Then the rate of convergence of the GGA($\tau, b, \mu$) is given by

$$E(G_m) - \inf_{x \in A_1(\mathcal{D})} E(x) \le C(b, \gamma, q)\Big(1 + \sum_{k=1}^{m} t_k^p\Big)^{-\frac{t_m(1-b)(q-1)}{q + t_m(1-b)}}, \quad p := \frac{q}{q-1}.$$

Proof. As in (5.3), we get

$$E(G_m) \le E(G_{m-1}) - t_m(1-b) c_m E_{\mathcal{D}}(G_{m-1}). \tag{5.15}$$

Thus we need to estimate $c_m E_{\mathcal{D}}(G_{m-1})$ from below. Denote $b_n := 1 + \sum_{j=1}^{n} c_j$. By Lemma 3.1 we obtain

$$E_{\mathcal{D}}(G_{m-1}) \ge (E(G_{m-1}) - w)/b_{m-1}, \quad w := \inf_{x \in A_1(\mathcal{D})} E(x). \tag{5.16}$$

Substituting (5.16) into (5.15) and using the notation $a_m := E(G_m) - w$ we get

$$a_m \le a_{m-1}\big(1 - t_m(1-b) c_m/b_{m-1}\big). \tag{5.17}$$

From the definition of $b_m$ we find

$$b_m = b_{m-1} + c_m = b_{m-1}(1 + c_m/b_{m-1}).$$

Using the inequality

$$(1 + x)^\alpha \le 1 + \alpha x, \quad 0 \le \alpha \le 1, \quad x \ge 0,$$

we obtain

$$b_m^{t_m(1-b)} \le b_{m-1}^{t_m(1-b)}\big(1 + t_m(1-b) c_m/b_{m-1}\big). \tag{5.18}$$

Multiplying (5.17) and (5.18), and using that $t_m \le t_{m-1}$, we get

$$a_m b_m^{t_m(1-b)} \le a_{m-1} b_{m-1}^{t_{m-1}(1-b)} \le a_0. \tag{5.19}$$

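The function $\mu(u)/u = \gamma u^{q-1}$ is increasing on $[0, \infty)$. Therefore the $c_m$ is greater than or equal to $c'_m$ from (see (5.16))

$$\gamma (c'_m)^q = \frac{t_m b}{2}\, c'_m a_{m-1}/b_{m-1}, \tag{5.20}$$

$$c'_m = \Big(\frac{t_m b}{2\gamma}\, \frac{a_{m-1}}{b_{m-1}}\Big)^{\frac{1}{q-1}}. \tag{5.21}$$

Using notations

$$p := \frac{q}{q-1}, \qquad A_q := \Big((1-b)\Big(\frac{b}{2\gamma}\Big)^{\frac{1}{q-1}}\Big)^{-1},$$

we obtain

$$a_m \le a_{m-1}\Big(1 - \frac{t_m^p\, a_{m-1}^{1/(q-1)}}{A_q\, b_{m-1}^p}\Big), \tag{5.22}$$

from (5.17) and (5.21). Noting that $b_m \ge b_{m-1}$, we infer from (5.22) that

$$\frac{a_m^{1/(q-1)}}{b_m^p} \le \frac{a_{m-1}^{1/(q-1)}}{b_{m-1}^p}\Big(1 - \frac{t_m^p}{A_q}\, \frac{a_{m-1}^{1/(q-1)}}{b_{m-1}^p}\Big). \tag{5.23}$$

We obtain from (5.23) by an analog of Lemma 2.16 from Chapter 2 of [12] (see [9], Lemma 3.1)

$$\frac{a_m^{1/(q-1)}}{b_m^p} \le C\Big(1 + \sum_{k=1}^{m} t_k^p\Big)^{-1}. \tag{5.24}$$

Combining (5.19) and (5.24), we get

$$a_m \le C(E, b, \gamma, q)\Big(1 + \sum_{k=1}^{m} t_k^p\Big)^{-\frac{t_m(1-b)(q-1)}{q + t_m(1-b)}}, \quad p := \frac{q}{q-1}.$$

This completes the proof of Theorem 5.3. □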

We note that in the special case when $\mathcal{D}$ is the unit sphere $S$ of $X$ the rate of convergence in Theorem 5.3 can be improved.

Theorem 5.4. Let $\tau := \{t_k\}_{k=1}^{\infty}$ be a weakness sequence, $t_k \in [0,1]$, and $b \in (0,1)$. Assume that the uniformly smooth convex function $E$ has a modulus of smoothness $\rho(E, D, u) \le \gamma u^q$, $q \in (1,2]$. Denote $\mu(u) = \gamma u^q$. Suppose that $\mathcal{D} = S$. Then the rate of convergence of the GGA($\tau, b, \mu$) is given by

$$E(G_m) - \inf_{x \in D} E(x) \le C(E, b, \gamma, q)\Big(1 + \sum_{k=1}^{m} t_k^p\Big)^{1-q}, \quad p := \frac{q}{q-1}. \tag{5.25}$$

Proof. As we already mentioned in the Introduction, in the case $\mathcal{D} = S$ we have

$$E_{\mathcal{D}}(x) = \|E'(x)\|_{X^*}.$$

For $\mu(u) = \gamma u^q$ we obtain for the $c_m$

$$\gamma c_m^q = \frac{t_m b}{2}\, c_m \|E'(G_{m-1})\|_{X^*} \quad \Rightarrow \quad c_m = \Big(\frac{t_m b}{2\gamma} \|E'(G_{m-1})\|_{X^*}\Big)^{\frac{1}{q-1}}.$$

Therefore, by (5.15) we get

$$E(G_m) \le E(G_{m-1}) - t_m(1-b)\Big(\frac{t_m b}{2\gamma} \|E'(G_{m-1})\|_{X^*}\Big)^{\frac{1}{q-1}} \|E'(G_{m-1})\|_{X^*}. \tag{5.26}$$

Inequality (5.26) implies that $G_m \in D$ for all $m$. Using the notation $a_m := E(G_m) - w$, $w := \inf_{x \in D} E(x)$ we obtain

$$a_{m-1} = \sup_{f \in D} (E(G_{m-1}) - E(f)) \le \|E'(G_{m-1})\|_{X^*} \sup_{f \in D} \|G_{m-1} - f\|. \tag{5.27}$$

Inequality (5.27) and our assumption that $D$ is bounded imply

$$\|E'(G_{m-1})\|_{X^*} \ge a_{m-1}/C_1.$$

Substituting this bound into (5.26) we get

$$a_m \le a_{m-1}\Big(1 - t_m^p\, a_{m-1}^{\frac{1}{q-1}}\, C_2^{-1}\Big). \tag{5.28}$$

Inequality (5.28) is similar to (5.22). We derive (5.25) from (5.28) in the same way as (5.24) was derived from (5.22). □